Rn/na-notebook #578
base: dev
removing some lint and adding zoo to DESCRIPTION
Comments so far. Will revisit.
impute_locf <- function(data) {
  data %>%
    group_by(geo_value) %>%
    mutate(across(where(is.numeric), ~ zoo::na.locf(.x, na.rm = FALSE), .names = "{.col}_locf")) %>%
todo: use `fill` from tidyr to do this instead, to focus on a smaller set of packages. I guess we should probably include an `arrange` to make sure things are arranged by `time_value` within each `geo_value` as well.
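A minimal sketch of what that could look like, assuming `tidyr::fill()` applied within `geo_value` groups after sorting by `time_value`. The `toy` data frame and its `cases` column are made up for illustration, and this is not necessarily how the notebook should end up implementing it:

```r
library(dplyr)
library(tidyr)

# Sketch only: LOCF per geo_value using tidyr::fill() instead of zoo::na.locf().
impute_locf <- function(data) {
  data %>%
    # Sort first so "last observation" means the most recent time_value.
    arrange(geo_value, time_value) %>%
    group_by(geo_value) %>%
    # Copy each numeric column into a "<col>_locf" column, then fill downwards.
    mutate(across(where(is.numeric), ~ .x, .names = "{.col}_locf")) %>%
    fill(ends_with("_locf"), .direction = "down") %>%
    ungroup()
}

# Hypothetical toy data, deliberately out of time order.
toy <- tibble(
  geo_value = c("ca", "ca", "ca", "ny", "ny", "ny"),
  time_value = as.Date("2021-01-01") + c(2, 0, 1, 0, 1, 2),
  cases = c(NA, 1, NA, NA, 5, NA)
)

out <- impute_locf(toy)
```

Leading NAs within a `geo_value` stay NA, which matches the `na.rm = FALSE` behaviour of the `zoo::na.locf()` call above.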
### NAs from merging
First let's start with discussing the most common type of missing values that appeared in the context of my auxiliary signal project. When working with multiple signals each signal will likely begin recording at different times. In other words each signal's first data point $t_0$ will differ on the absolute time scale. As a result, when calling `epix_merge()` to combine multiple signals, the signals that started recording at a later point in time will have missing values for the time periods where the other signals were already recording. Here's a quick example
todo: let's present this as more instructional. Here, that means just tweaking language, getting rid of "that appeared in the context of my auxiliary signal project", and then softening/rewording "the most common" since it may not be the most common in general.
(But in other parts we actually need to make the content more instructional/relevant (use real data, not buggy/artificial).)
issue: `epix_merge()` can introduce NAs both from differing min `time_value`s and differing min `version`s, but
- the description here is a little ambiguous about which it's referring to
- the table is showing something more like an `epi_df`

Possible fix: instead of mentioning `epix_merge()` at this point, we could say this is an issue with some types of joins, and then transition into the `epix_merge()` example with some more discussion.
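To illustrate the "some types of joins" framing, here is a small sketch using a plain `dplyr::full_join()` rather than `epix_merge()`. The two signals, their column names `a` and `b`, and their dates are invented for the example:

```r
library(dplyr)

# Hypothetical signal that starts reporting on 2021-01-01.
signal_a <- tibble(
  geo_value = "ca",
  time_value = as.Date("2021-01-01") + 0:3,
  a = c(10, 11, 12, 13)
)

# Hypothetical signal that only starts reporting on 2021-01-03.
signal_b <- tibble(
  geo_value = "ca",
  time_value = as.Date("2021-01-03") + 0:1,
  b = c(0.5, 0.6)
)

# A full join keeps every (geo_value, time_value) pair seen in either input,
# so `b` is NA for the dates before signal_b started.
merged <- full_join(signal_a, signal_b, by = c("geo_value", "time_value")) %>%
  arrange(geo_value, time_value)
```

An inner join would instead drop the early rows of `signal_a`, so the NAs here are a consequence of the join type and not of the data sources themselves.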
```{r latest_fn}
latest <- function(x) {
  epix_as_of(x, max_version = max(x$versions_end))
Suggested change:
- epix_as_of(x, max_version = max(x$versions_end))
+ epix_as_of(x, x$versions_end)
we renamed `max_version`, and `versions_end` is a scalar
  geo_type = "state",
  time_values = epirange(20200220, today),
  geo_values = states,
  issues = epirange(20201130, today)
suggest: move to `time_values = "*", issues = epirange(12340101, today)` or some other absurdly early start issue
and then maybe do something else to show people the range of time values and issues in the data
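One possible way to show those ranges, sketched on a made-up data frame. The `fetched` object and its `issue` column are assumptions standing in for whatever the real query returns:

```r
library(dplyr)

# Hypothetical stand-in for the fetched data: one row per
# (geo_value, time_value, issue) combination.
fetched <- tibble(
  geo_value = c("ca", "ca", "ny", "ny"),
  time_value = as.Date(c("2020-02-20", "2020-03-01", "2020-02-25", "2020-03-05")),
  issue = as.Date(c("2020-11-30", "2020-12-02", "2020-12-01", "2020-12-10")),
  value = c(1.2, 1.5, 0.8, 0.9)
)

# Summarize the span of reference dates and of issue dates actually present.
ranges <- fetched %>%
  summarize(
    min_time_value = min(time_value),
    max_time_value = max(time_value),
    min_issue = min(issue),
    max_issue = max(issue)
  )
```

Printing `ranges` in the notebook would let readers see what the open-ended query actually covered, without hard-coding start dates.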
Checklist
Please:
PR).
brookslogan, nmdefries.
`DESCRIPTION`. Always increment the patch version number (the third number), unless you are making a release PR from dev to main, in which case increment the minor version number (the second number).
(backwards-incompatible changes to the documented interface) are noted. Collect the changes under the next release number (e.g. if you are on 1.7.2, then write your changes under the 1.8 heading).
process.
Change explanations for reviewer
Added the reviewed na-notebook. Key differences from last time: added an example and explanation of `complete()`, and added the LOCF in version example.
Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch